cephadm/services/cephadmservice: shutdown monitors before removing them #45511
Conversation
@kamoltat could you please add the same context you provided in https://tracker.ceph.com/issues/50089#note-11 to the commit message as well, so that reviewers understand the motivation behind this PR

cmd = "sudo systemctl stop ceph-%s@%s" % (fsid, service_name)
os.system(cmd)
@adk3798 I don't think this will actually work. Currently, I don't think there is a MON or other ceph CLI command that allows us to stop the monitor; there is only `service ceph -a stop mon.{mon-id}` or using `systemctl`. These are all executed through cmd and not the ceph CLI. Do you mind guiding me on how I can execute cmd commands inside this part of cephadm? Thanks!
I'd think a change like this to the way we shut down the daemon belongs more in https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L6446 than here in the mgr module. All daemon removals run through that code path, although it looks like it already runs `systemctl stop`. Either way, that's where you'd want to look.
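To make the suggested ordering concrete, here is a minimal sketch of what the daemon-removal path could do, reusing the `call(ctx, ...)` helper that appears in the diff later in this thread. The function name and the systemd unit-name formatting are assumptions for illustration, not the actual patch:

```python
# Illustrative sketch only, not the merged change. Assumes cephadm's
# call(ctx, cmd) helper and a ceph-<fsid>@mon.<id> systemd unit name.
def _stop_then_remove_mon(ctx, fsid, daemon_id):
    # 1. stop the mon daemon first, so it leaves quorum cleanly
    call(ctx, ['systemctl', 'stop', 'ceph-%s@mon.%s' % (fsid, daemon_id)])
    # 2. only once it is down, remove it from the monmap
    call(ctx, ['ceph', 'mon', 'rm', daemon_id])
```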
thanks! will take a look
@neha-ojha sure thing! Sorry, sometimes I like to open PRs and change the descriptions etc. afterwards
Force-pushed from 17d4ba5 to 9e4f66c
Force-pushed from 84c608d to 3b240b7
src/cephadm/cephadm (Outdated)

# remove the mon from cluster after we stop the mon.
if daemon_type == 'mon':
    call(ctx, ['ceph', 'mon', 'rm', daemon_id],
Can I get some clarification on what the exact order of operations we need is? We have a few things going on:
- Stopping the mon (right above this)
- Removing the mon from the quorum (this command here)
- Disabling and fully removing the mon (next few lines after this)

What are the requirements in terms of ordering here? If it's just that we need to stop the mon before removing it from the quorum, you could consider moving the `mon rm` command you removed from pre-remove into post-remove, which will run after rm_daemon (thus making it the last of the 3 operations to happen). If it is necessary to have this exact order where we stop the mon, then remove it from the quorum, then disable and fully remove the mon, you could leave it here. However, this would need to be done as a shell command (a shell command basically starts up a container to run the command in) rather than just calling `call` (see https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L3640), as there is no guarantee that the ceph packages are installed directly on the host (which is where `call` will run). This could be a bit tricky to set up correctly, which is why I want to verify that we NEED this to be done in the middle here and not just after the mon is stopped. A sketch of the simpler post-remove option follows.
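For reference, a rough sketch of what that post-remove variant in the mgr module could look like, modeled on the `check_mon_command` snippet quoted later in this thread. The hook and attribute names are assumed from the cephadm service layout, not copied from the patch:

```python
import logging

logger = logging.getLogger(__name__)

# Sketch of a post_remove() hook body for the mon service in the mgr
# module; self.mgr.check_mon_command mirrors the snippet quoted below.
def post_remove(self, daemon) -> None:
    # rm_daemon has already stopped and removed the daemon by the time
    # post_remove runs, so the mon is down before we touch the monmap
    logger.info('Removing monitor %s from monmap...' % daemon.daemon_id)
    ret, out, err = self.mgr.check_mon_command({
        'prefix': 'mon rm',
        'name': daemon.daemon_id,
    })
```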
1/1 passed; evaluated by manually checking the monitor leader's log to verify there were no crashes
Force-pushed from 3b240b7 to 08feede
Force-pushed from 4f4fcbe to 587d8bf
Force-pushed from d800aa1 to e6a68d7
@adk3798 Forgot to rebase locally and force-pushed here, so I had to copy-paste your work into my new commit. Just letting you know in case you wonder what happened (sorry, didn't mean to steal your work lol).
Make check failed; need 2 blank lines :(
Force-pushed from e6a68d7 to 96401ed
Force-pushed from cc637e0 to f9f59c2
Force-pushed from f9f59c2 to 493f8b3
1 infra failure (the dead job), 1 issue with test_repos that broke a few days ago and is unrelated to PRs in the run
# remove mon from quorum before we destroy the daemon
logger.info('Removing monitor %s from monmap...' % daemon_id)
ret, out, err = self.mgr.check_mon_command({
    'prefix': 'mon rm',
I'm still getting health warnings whenever I scale down mons at all with this patch.
[ceph: root@vm-00 /]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 47s ago 6m count:1
crash 3/3 49s ago 6m *
grafana ?:3000 1/1 47s ago 6m count:1
mgr 2/2 49s ago 7m count:2
mon 3/5 49s ago 7m count:5
node-exporter ?:9100 3/3 49s ago 6m *
osd.all-available-devices 12 49s ago 4m *
prometheus ?:9095 1/1 49s ago 6m count:1
[ceph: root@vm-00 /]# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.vm-00 vm-00 *:9093,9094 running (70s) 51s ago 6m 15.9M - ba2b418f427c 9f6e62f1c9d1
crash.vm-00 vm-00 running (6m) 51s ago 6m 6979k - 17.0.0-13734-g96abb011 f4b280ba1496 966c1dfc7d74
crash.vm-01 vm-01 running (4m) 53s ago 4m 7159k - 17.0.0-13734-g96abb011 f4b280ba1496 c6b54137989f
crash.vm-02 vm-02 running (4m) 53s ago 4m 7180k - 17.0.0-13734-g96abb011 f4b280ba1496 b9d9d21e028e
grafana.vm-00 vm-00 *:3000 running (66s) 51s ago 6m 40.8M - 8.3.5 dad864ee21e9 3b91c5d4cc78
mgr.vm-00.bdwsut vm-00 *:9283 running (7m) 51s ago 7m 479M - 17.0.0-13734-g96abb011 f4b280ba1496 06aec7958775
mgr.vm-02.haworo vm-02 *:8443,9283 running (4m) 53s ago 4m 428M - 17.0.0-13734-g96abb011 f4b280ba1496 5fd8cceb7c0d
mon.vm-00 vm-00 running (7m) 51s ago 7m 56.5M 2048M 17.0.0-13734-g96abb011 f4b280ba1496 7d09db9b0721
mon.vm-01 vm-01 running (4m) 53s ago 4m 48.0M 2048M 17.0.0-13734-g96abb011 f4b280ba1496 ea18ee80f985
mon.vm-02 vm-02 running (4m) 53s ago 4m 52.6M 2048M 17.0.0-13734-g96abb011 f4b280ba1496 30934b1e44e1
node-exporter.vm-00 vm-00 *:9100 running (6m) 51s ago 6m 19.5M - 1dbe0e931976 ebdb46f6cffe
node-exporter.vm-01 vm-01 *:9100 running (4m) 53s ago 4m 24.0M - 1dbe0e931976 2bc17a3379c4
node-exporter.vm-02 vm-02 *:9100 running (4m) 53s ago 4m 22.8M - 1dbe0e931976 1838354f8634
osd.0 vm-01 running (3m) 53s ago 3m 71.6M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 45d1ec388ca2
osd.1 vm-02 running (3m) 53s ago 3m 71.9M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 e8829f51624e
osd.2 vm-00 running (2m) 51s ago 2m 68.1M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 6340cd1be0d5
osd.3 vm-01 running (2m) 53s ago 2m 70.8M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 ad77363d8a7f
osd.4 vm-02 running (2m) 53s ago 2m 68.7M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 9219b47952b9
osd.5 vm-00 running (2m) 51s ago 2m 70.2M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 14d5661c5bab
osd.6 vm-02 running (110s) 53s ago 2m 68.0M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 427652c89872
osd.7 vm-02 running (88s) 53s ago 100s 69.0M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 f41b405ce4ca
osd.8 vm-01 running (2m) 53s ago 2m 68.0M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 3662aca0f34a
osd.9 vm-00 running (2m) 51s ago 2m 67.4M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 a3c94dc82ebf
osd.10 vm-01 running (2m) 53s ago 2m 67.2M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 30a0e569b975
osd.11 vm-00 running (3m) 51s ago 3m 72.0M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 fc1ab3e586c3
prometheus.vm-01 vm-01 *:9095 running (4m) 53s ago 4m 59.5M - 514e6a882f6e 640fcf0a4ee7
[ceph: root@vm-00 /]# ceph health detail
HEALTH_OK
[ceph: root@vm-00 /]# ceph orch apply mon 2
Scheduled mon update...
[ceph: root@vm-00 /]# ceph orch ls
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
alertmanager ?:9093,9094 1/1 97s ago 9m count:1
crash 3/3 97s ago 9m *
grafana ?:3000 1/1 97s ago 9m count:1
mgr 2/2 97s ago 9m count:2
mon 2/2 97s ago 103s count:2
node-exporter ?:9100 3/3 97s ago 9m *
osd.all-available-devices 12 97s ago 7m *
prometheus ?:9095 1/1 32s ago 9m count:1
[ceph: root@vm-00 /]# ceph orch ps
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
alertmanager.vm-00 vm-00 *:9093,9094 running (2m) 9s ago 7m 23.1M - ba2b418f427c 9f6e62f1c9d1
crash.vm-00 vm-00 running (7m) 9s ago 7m 6979k - 17.0.0-13734-g96abb011 f4b280ba1496 966c1dfc7d74
crash.vm-01 vm-01 running (5m) 11s ago 5m 7159k - 17.0.0-13734-g96abb011 f4b280ba1496 c6b54137989f
crash.vm-02 vm-02 running (5m) 11s ago 5m 7180k - 17.0.0-13734-g96abb011 f4b280ba1496 b9d9d21e028e
grafana.vm-00 vm-00 *:3000 running (2m) 9s ago 7m 44.9M - 8.3.5 dad864ee21e9 3b91c5d4cc78
mgr.vm-00.bdwsut vm-00 *:9283 running (8m) 9s ago 8m 487M - 17.0.0-13734-g96abb011 f4b280ba1496 06aec7958775
mgr.vm-02.haworo vm-02 *:8443,9283 running (5m) 11s ago 5m 428M - 17.0.0-13734-g96abb011 f4b280ba1496 5fd8cceb7c0d
mon.vm-00 vm-00 running (8m) 9s ago 8m 59.7M 2048M 17.0.0-13734-g96abb011 f4b280ba1496 7d09db9b0721
mon.vm-01 vm-01 running (5m) 11s ago 5m 50.7M 2048M 17.0.0-13734-g96abb011 f4b280ba1496 ea18ee80f985
node-exporter.vm-00 vm-00 *:9100 running (7m) 9s ago 7m 19.6M - 1dbe0e931976 ebdb46f6cffe
node-exporter.vm-01 vm-01 *:9100 running (5m) 11s ago 5m 21.4M - 1dbe0e931976 2bc17a3379c4
node-exporter.vm-02 vm-02 *:9100 running (5m) 11s ago 5m 24.5M - 1dbe0e931976 1838354f8634
osd.0 vm-01 running (4m) 11s ago 4m 72.9M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 45d1ec388ca2
osd.1 vm-02 running (4m) 11s ago 4m 73.2M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 e8829f51624e
osd.2 vm-00 running (3m) 9s ago 3m 69.0M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 6340cd1be0d5
osd.3 vm-01 running (3m) 11s ago 3m 71.7M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 ad77363d8a7f
osd.4 vm-02 running (3m) 11s ago 3m 69.8M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 9219b47952b9
osd.5 vm-00 running (3m) 9s ago 3m 71.1M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 14d5661c5bab
osd.6 vm-02 running (2m) 11s ago 3m 68.9M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 427652c89872
osd.7 vm-02 running (2m) 11s ago 2m 70.0M 5435M 17.0.0-13734-g96abb011 f4b280ba1496 f41b405ce4ca
osd.8 vm-01 running (3m) 11s ago 3m 69.0M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 3662aca0f34a
osd.9 vm-00 running (3m) 9s ago 3m 68.4M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 a3c94dc82ebf
osd.10 vm-01 running (3m) 11s ago 3m 68.5M 6203M 17.0.0-13734-g96abb011 f4b280ba1496 30a0e569b975
osd.11 vm-00 running (4m) 9s ago 4m 73.4M 4923M 17.0.0-13734-g96abb011 f4b280ba1496 fc1ab3e586c3
prometheus.vm-01 vm-01 *:9095 running (5m) 11s ago 5m 64.2M - 514e6a882f6e 640fcf0a4ee7
[ceph: root@vm-00 /]# ceph health detail
HEALTH_WARN 1/3 mons down, quorum vm-00,vm-01
[WRN] MON_DOWN: 1/3 mons down, quorum vm-00,vm-01
mon.vm-02 (rank 1) addr [v2:192.168.122.121:3300/0,v1:192.168.122.121:6789/0] is down (out of quorum)
@kamoltat is it just me getting this? When you scale down you don't get any warning?
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
Force-pushed from 493f8b3 to 347518f
Motivation behind this PR is from the tracker: https://tracker.ceph.com/issues/50089

The `ceph orch apply mon <num>` command removes the monitor from the cluster before shutting the monitor down, which has created issues in the mon/elector.cc part of Ceph. The design of the elector class assumes that we shut a monitor down before removing it from the cluster, so this should be enforced everywhere in Ceph. Correct steps for manually removing a monitor: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#removing-a-monitor-manual

In the src/cephadm/cephadm function `command_rm_daemon()`, if the daemon is a monitor, we remove it from the cluster by calling the `CephContainer` class with the command `mon remove <daemon id>`. This happens strictly after we have stopped the monitor daemon using `systemctl stop`. In cephadm/services/cephadmservice.py, function `pre_remove()`, we omit the part where we remove the monitor before destroying the daemon.

fixes: https://tracker.ceph.com/issues/50089

Signed-off-by: Kamoltat <ksirivad@redhat.com>
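As a rough illustration of the ordering this commit describes: the `rm_mon_daemon` wrapper and the `CephContainer` argument shape below are assumptions for the sketch, not the verbatim diff (in particular, config/keyring mounts needed by the containerized `ceph` command are omitted):

```python
# Sketch of command_rm_daemon()'s mon handling, not the exact patch.
# call() and CephContainer are cephadm internals assumed here.
def rm_mon_daemon(ctx, fsid, daemon_id):
    # the daemon's systemd unit is stopped first...
    call(ctx, ['systemctl', 'stop', 'ceph-%s@mon.%s' % (fsid, daemon_id)])
    # ...then the mon is removed from the cluster via a container,
    # since ceph packages may not be installed on the host itself
    CephContainer(
        ctx,
        image=ctx.image,
        entrypoint='/usr/bin/ceph',
        args=['mon', 'remove', daemon_id],
    ).run()
```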
Force-pushed from 347518f to bc72ae5
Added a test where we have 5 MONs on 5 different hosts and try to reduce the number of MONs from 5 to 3 using the command ``ceph orch apply mon 3``, then increase the number of MONs from 3 to 5 using the command ``ceph orch apply mon 5``, evaluating the correctness of the commands and inspecting whether there are crashes or unhealthy cluster states. This test was motivated by the bug: https://tracker.ceph.com/issues/50089 Signed-off-by: Kamoltat <ksirivad@redhat.com>
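A standalone approximation of what the test exercises (the actual test lives in the qa suite; this sketch just shells out to the same `ceph orch apply mon` and health commands and is purely illustrative):

```python
import json
import subprocess
import time

def wait_for_mon_count(expected: int, timeout: int = 600) -> None:
    """Poll the monmap until it contains the expected number of mons."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        out = subprocess.check_output(['ceph', 'mon', 'dump', '-f', 'json'])
        if len(json.loads(out)['mons']) == expected:
            return
        time.sleep(10)
    raise RuntimeError('timed out waiting for %d mons' % expected)

# scale down 5 -> 3, then back up 3 -> 5, checking health each time
for count in (3, 5):
    subprocess.check_call(['ceph', 'orch', 'apply', 'mon', str(count)])
    wait_for_mon_count(count)
    health = subprocess.check_output(['ceph', 'health']).decode().strip()
    assert health.startswith('HEALTH_OK'), health
```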
Force-pushed from bc72ae5 to 8e0a046
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
(From Core Rados PR scrub) @adk3798 So the issue is solved by this PR already: #44993. This PR will only change the mechanism of
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
Motivation behind this PR is from the tracker:
https://tracker.ceph.com/issues/50089

The `ceph orch apply mon <num>` command removes the monitor from the cluster before shutting the monitor down, which has created issues in the mon/elector.cc part of Ceph. The design of the elector class assumes that we shut a monitor down before removing it from the cluster. Therefore, this should be enforced everywhere in Ceph.

Correct steps for manually removing a monitor:
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#removing-a-monitor-manual

Moreover, added a test where we have 5 MONs on 5 different hosts and try to reduce the number of MONs from 5 to 3 using the command `ceph orch apply mon 3`, then increase the number of MONs from 3 to 5 using the command `ceph orch apply mon 5`, evaluating the correctness of the commands and inspecting whether there are crashes or unhealthy cluster states.

This PR is blocked by:
https://tracker.ceph.com/issues/55695 - the test won't pass because there is a chance monitor(s) are not removed.

Fixes: https://tracker.ceph.com/issues/50089

Signed-off-by: Kamoltat <ksirivad@redhat.com>